Search for: All records

Creators/Authors contains: "Xu, Jun"


  1. Large language models (LLMs) are widely used in software development. However, the code generated by LLMs often contains vulnerabilities. Several secure code generation methods have been proposed to address this issue, but their current evaluation schemes leave several concerns unaddressed. Specifically, most existing studies evaluate security and functional correctness separately, using different datasets: they assess vulnerabilities using security-related code datasets while validating functionality with general code datasets. In addition, prior research primarily relies on a single static analyzer, CodeQL, to detect vulnerabilities in generated code, which limits the scope of security evaluation. In this work, we conduct a comprehensive study to systematically assess the improvements introduced by four state-of-the-art secure code generation techniques. Specifically, we apply both security inspection and functionality validation to the same generated code and evaluate these two aspects together. We also employ three popular static analyzers and two LLMs to identify potential vulnerabilities in the generated code. Our study reveals that existing techniques often compromise the functionality of generated code to enhance security. Their overall performance remains limited when security and functionality are evaluated together; many techniques even degrade the performance of the base LLM by more than 50%. Our further inspection reveals that these techniques often either remove vulnerable lines of code entirely or generate "garbage code" that is unrelated to the intended task. Moreover, the commonly used static analyzer CodeQL fails to detect several vulnerabilities, further obscuring the actual security improvements achieved by existing techniques. Our study serves as a guideline for a more rigorous and comprehensive evaluation of secure code generation performance in future work.
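The abstract's core methodological point, evaluating security and functionality on the same generated code, can be captured by a joint pass-rate metric. The sketch below is purely illustrative (the `Sample` class and field names are hypothetical, not from the paper): a sample counts as a success only if it both passes the task's tests and is flagged clean by the analyzers.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    """One generated code sample with both evaluation outcomes."""
    passes_tests: bool   # functional correctness on the task's test suite
    is_secure: bool      # no vulnerability flagged by any analyzer

def joint_pass_rate(samples):
    """Fraction of samples that are BOTH functional and secure.

    Evaluating the two aspects on the same code, rather than on separate
    datasets, avoids counting insecure-but-working or secure-but-broken
    code as a success.
    """
    if not samples:
        return 0.0
    ok = sum(1 for s in samples if s.passes_tests and s.is_secure)
    return ok / len(samples)

# A technique that strips vulnerable lines may look "secure" while
# breaking functionality -- the joint metric penalizes that trade-off.
baseline = [Sample(True, False), Sample(True, True),
            Sample(True, True), Sample(False, True)]
hardened = [Sample(False, True), Sample(True, True),
            Sample(False, True), Sample(False, True)]
print(joint_pass_rate(baseline))  # 0.5
print(joint_pass_rate(hardened))  # 0.25
```

Under this metric, the "hardened" technique scores below the baseline even though every one of its samples is secure, mirroring the paper's finding that some techniques trade functionality for security.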
  2. Internal soil erosion caused by water infiltration around defective buried pipes poses a significant threat to the long-term stability of underground infrastructures such as pipelines and highway culverts. This study employs a coupled computational fluid dynamics–discrete element method (CFD–DEM) framework to simulate the detachment, transport, and redistribution of soil particles under varying infiltration pressures and pipe defect geometries. Using ANSYS Fluent (CFD) and Rocky (DEM), the simulation resolves both the fluid flow field and granular particle dynamics, capturing erosion cavity formation, void evolution, and soil particle transport in three dimensions. The results reveal that increased infiltration pressure and defect size in the buried pipe significantly accelerate the process of erosion and sinkhole formation, leading to potentially unstable subsurface conditions. Visualization of particle migration, sinkhole development, and soil velocity distributions provides insight into the mechanisms driving localized failure. The findings highlight the importance of considering fluid–particle interactions and defect characteristics in the design and maintenance of buried structures, offering a predictive basis for assessing erosion risk and infrastructure vulnerability. 
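The fluid–particle interaction at the heart of a CFD–DEM coupling can be illustrated with a minimal drag-force update. This sketch uses Stokes drag and explicit Euler integration under assumed material constants (all names and values are hypothetical placeholders, not taken from the ANSYS Fluent/Rocky simulation in the study):

```python
import numpy as np

# Assumed material constants for the sketch (not from the study)
MU = 1.0e-3          # dynamic viscosity of water, Pa*s
RHO_P = 2650.0       # soil particle density, kg/m^3
G = np.array([0.0, 0.0, -9.81])  # gravity, m/s^2

def drag_force(d, u_fluid, u_particle):
    """Stokes drag on a sphere of diameter d (low-Reynolds-number regime)."""
    return 3.0 * np.pi * MU * d * (u_fluid - u_particle)

def dem_step(x, v, d, u_fluid, dt):
    """Advance one particle by explicit Euler under drag and gravity.

    x, v: particle position and velocity (3-vectors)
    d: particle diameter in meters
    u_fluid: local fluid velocity interpolated from the CFD grid
    """
    m = RHO_P * np.pi * d**3 / 6.0           # sphere mass
    a = drag_force(d, u_fluid, v) / m + G    # drag + gravity
    v_new = v + a * dt
    x_new = x + v_new * dt
    return x_new, v_new
```

In a full CFD–DEM loop, `u_fluid` would be interpolated at each particle from the resolved flow field near the pipe defect, and the particle drag would in turn feed back as a momentum sink on the fluid; the one-way step above shows only the particle side of that exchange.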
  3. Large language models (LLMs) have demonstrated revolutionary capabilities in understanding complex contexts and performing a wide range of tasks. However, LLMs can also answer questions that are unethical or harmful, raising concerns about their applications. To regulate LLMs' responses to such questions, a training strategy called alignment can help. Yet, alignment can be unexpectedly compromised when fine-tuning an LLM for downstream tasks. This paper focuses on recovering the alignment lost during fine-tuning. We observe that there are two distinct directions inherent in an aligned LLM: the aligned direction and the harmful direction. An LLM is inclined to answer questions in the aligned direction while refusing queries in the harmful direction. Therefore, we propose to recover the harmful direction of the fine-tuned model that has been compromised. Specifically, we restore a small subset of the fine-tuned model's weight parameters from the original aligned model using gradient descent. We also introduce a rollback mechanism to avoid aggressive recovery and maintain downstream task performance. Our evaluation on 125 fine-tuned LLMs demonstrates that our method can reduce their harmful rate (percentage of answering harmful questions) from 33.25% to 1.74%, without sacrificing task performance much. In contrast, the existing methods either only reduce the harmful rate to a limited extent or significantly impact the normal functionality. Our code is available at https://github.com/kangyangWHU/LLMAlignment 
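The recovery idea, restoring a small subset of weight parameters toward the original aligned model, with a rollback guard on task performance, can be sketched in a few lines. This is a simplified numerical stand-in (the paper uses gradient descent on LLM weights; here we interpolate the most-drifted coordinates of a plain array, and `task_score` is a hypothetical callback):

```python
import numpy as np

def recover_alignment(w_aligned, w_tuned, top_frac=0.1, lr=0.5, steps=10,
                      task_score=None, min_score=0.0):
    """Pull the most-drifted weights of a fine-tuned model back toward
    the aligned model, rolling back if task performance drops too far.

    w_aligned, w_tuned: flat weight vectors of the aligned and
    fine-tuned models; top_frac selects the fraction of coordinates
    with the largest drift to restore.
    """
    w = w_tuned.copy()
    drift = np.abs(w_tuned - w_aligned)
    k = max(1, int(top_frac * w.size))
    idx = np.argsort(drift)[-k:]          # coordinates that moved the most
    for _ in range(steps):
        prev = w.copy()
        w[idx] += lr * (w_aligned[idx] - w[idx])   # step toward aligned weights
        if task_score is not None and task_score(w) < min_score:
            return prev                   # rollback: recovery was too aggressive
    return w
```

The rollback mirrors the paper's mechanism: each restoration step is accepted only while the downstream task metric stays above a threshold, so alignment is recovered without erasing the fine-tuning.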
  4. A multiphysics study evaluates the mechanical–electrochemical–thermal response and fundamental mechanisms of sodium-ion batteries (SIBs) under mechanical abuse, explores key safety parameters, and compares the safety of SIBs and lithium-ion batteries (LIBs) under mechanical loading.
  5. This paper focuses on fuzzing document software, or more precisely, software that processes document files (e.g., HTML, PDF, and DOCX). Document software typically requires highly structured inputs, which general-purpose fuzzing cannot handle well. We propose two techniques to facilitate fuzzing of document software. First, we design an intermediate document representation (DIR) for document files. DIR describes a document file in an abstract way that is independent of the underlying format. Reusing common SDKs, a DIR document can be lowered into a desired format without a deep understanding of that format. Second, we propose multi-level mutations that operate directly on a DIR document, which can explore the search space more thoroughly than existing single-level mutations. Combining these two techniques, we can reuse the same DIR-based generations and mutations to fuzz any document format, without separately handling the target format and re-engineering the generation/mutation components. To assess the utility of our DIR-based fuzzing, we applied it to 6 PDF and 6 HTML applications in 48-hour campaigns. It demonstrated superior performance, outpacing general mutation-based fuzzing (AFL++), ML-based PDF fuzzing (Learn&Fuzz), and structure-aware mutation-based fuzzing (NAUTILUS) by 33.87%, 127.74%, and 25.17% in code coverage, respectively. For HTML, it exceeded AFL++ and generation-based methods (FreeDom and Domato) by 28.8% and 14.02%.
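The two ideas in this abstract, a format-independent DIR that is lowered to a concrete format, and mutations applied at multiple levels of the tree, can be sketched together. This is an illustrative toy (the class, backend, and mutation choices are hypothetical and far simpler than the paper's system):

```python
import random

class DirNode:
    """A node in a format-independent document representation (DIR)."""
    def __init__(self, kind, attrs=None, children=None, text=""):
        self.kind = kind
        self.attrs = attrs or {}
        self.children = children or []
        self.text = text

def lower_to_html(node):
    """Lower a DIR tree into concrete HTML (one possible backend;
    a PDF backend could lower the same tree via an SDK instead)."""
    attrs = "".join(f' {k}="{v}"' for k, v in node.attrs.items())
    inner = node.text + "".join(lower_to_html(c) for c in node.children)
    return f"<{node.kind}{attrs}>{inner}</{node.kind}>"

def mutate(node, rng):
    """Multi-level mutation: pick a level, then mutate at that level."""
    level = rng.choice(["node", "attribute", "text"])
    if level == "node" and node.children:      # structural: duplicate a child
        node.children.append(rng.choice(node.children))
    elif level == "attribute":                 # attribute-level: flip a value
        node.attrs["id"] = str(rng.randrange(1000))
    else:                                      # leaf-level: perturb text
        node.text += "!"
    return node

rng = random.Random(0)
doc = DirNode("div", {"id": "1"}, [DirNode("p", text="hi")])
print(lower_to_html(mutate(doc, rng)))
```

Because the mutations touch only the DIR tree, the same `mutate` loop could drive a PDF, DOCX, or HTML target simply by swapping the lowering backend, which is the reuse property the abstract claims.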